{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 搜索算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们现在把注意力转向计算中经常出现的一些问题,即搜索和排序问题。在Python中,有一个非常简单的方法来询问一个项是否在一个项列表中。我们使用in运算符。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "33 in [10,14,19,26,27,31,33,35,42,44]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " 这很容易写,一个底层的操作替我们完成这个工作。事实证明,有很多不同的方法来搜索。我们在这里感兴趣的是这些算法如何工作以及它们如何相互比较。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 顺序查找 O(n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"image/chapter04/linearSearch.gif\" width=500>\n",
    "\n",
    "当数据项存储在诸如列表的集合中时,我们说它们具有线性或顺序关系。每个数据项都存储在相对于其他数据项的位置。在\tPython列表中,这些相对位置是单个项的索引值。由于这些索引值是有序的,我们可以按顺序访问它们。\n",
    "\n",
    "从列表中的第一个项目开始,我们按照基本的顺序排序,简单地从一个项移动到另一个项,直到找到我们正在寻找的项或遍历完整个列表。如果我们遍历完整个列表,则说明正在搜索的项不存在"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def sequentialSearch(alist,item):\n",
    "    i = 0\n",
    "    while i<len(alist):\n",
    "        if alist[i] == item:return True\n",
    "        i += 1\n",
    "    return False\n",
    "\n",
    "sequentialSearch([10,14,19,26,27,31,33,35,42,44],33)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 二分查找 O(logn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"image/chapter04/binarySearch.gif\" width=400>\n",
    "\n",
    "在顺序查找中,当我们与第一个项进行比较时,如果第一个项不是我们要查找的,则最多还有n-1个项目。对于一个有序表，二分查找从中间项开始,而不是按顺序查找列表。如果该项是我们正在寻找的项,我们就完成了查找。如果它不是,我们可以使用列表的有序性质来消除剩余项的一半。如果我们正在查找的项大于中间项,就可以消除中间项以及比中间项小的一半元素。如果该项在列表中,肯定在大的那半部分。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def binarySearch(alist,item):\n",
    "    low , high = 0,len(alist)-1\n",
    "    while low <= high:\n",
    "        mid = (low+high) // 2\n",
    "        if item == alist[mid]:return True\n",
    "        if item < alist[mid]:high=mid - 1\n",
    "        else:low = mid + 1\n",
    "            \n",
    "    return False\n",
    "\n",
    "binarySearch([15,17,18,22,35,51,60,88,93],18)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 排序算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 冒泡排序 O(n^2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"image/chapter04/bubbleSort.gif\" width=600>\n",
    "\n",
    "冒泡排序需要<span class=\"girk\">多次</span>遍历列表。它比较相邻的项并交换那些无序的项。<span class=\"mark\">每次遍历列表将下一个最大的值放在其正确的位置</span>。实质上,每个项“冒泡”到它所属的位置。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### 冒泡法总是不断地前进，每次前进时它都会将该次遍历的最大值靠右放，因而它前进的路程会越来越短。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 44, 38, 5, 47, 15, 36, 26, 27, 2, 46, 4, 19, 50, 48]\n",
      "[2, 3, 4, 5, 15, 19, 26, 27, 36, 38, 44, 46, 47, 48, 50]\n"
     ]
    }
   ],
   "source": [
    "def bubbleSort(alist):\n",
    "    \n",
    "    for N in range(len(alist),1,-1):\n",
    "        for j in range(1,N):\n",
    "            if alist[j-1]>alist[j]:\n",
    "                alist[j-1],alist[j] = alist[j],alist[j-1]      \n",
    "    return alist\n",
    "\n",
    "testList = [3,44,38,5,47,15,36,26,27,2,46,4,19,50,48]\n",
    "print(testList)\n",
    "print(bubbleSort(testList))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 选择排序 O(n^2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"image/chapter04/selectSort.gif\" width=600>\n",
    "\n",
    "选择排序改进了冒泡排序,每次遍历列表只做一次交换。为了做到这一点,一个选择排序在他遍历时寻找最小的值,并在完成遍历后,将其放置在正确的位置。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 44, 38, 5, 47, 15, 36, 26, 27, 2, 46, 4, 19, 50, 48]\n",
      "[2, 3, 4, 5, 15, 19, 26, 27, 36, 38, 44, 46, 47, 48, 50]\n"
     ]
    }
   ],
   "source": [
    "def selectSort(alist):\n",
    "    for N in range(len(alist)):\n",
    "        k = N\n",
    "        for i in range(N+1,len(alist)):\n",
    "            if alist[i]<alist[k]:k=i\n",
    "        alist[N],alist[k] = alist[k],alist[N]\n",
    "    \n",
    "    return alist\n",
    "\n",
    "testList = [3,44,38,5,47,15,36,26,27,2,46,4,19,50,48]\n",
    "print(testList)\n",
    "print(selectSort(testList))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "你可能会看到选择排序与冒泡排序有相同数量的比较,因此也是O(n^2)。\t然而,由于交换数量的减少,选择排序通常在基准研究中执行得更快。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 插入排序 O(n^2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"image/chapter04/insertSort.gif\" width=600>\n",
    "\n",
    "插入排序,尽管仍然是O(n^2),工作方式略有不同。它始终在列表的较低位置维护一个排序的子列表。然后将每个新项“插入”回先前的子列表,使得排序的子列表称为较大的一个项。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 44, 38, 5, 47, 15, 36, 26, 27, 2, 46, 4, 19, 50, 48]\n",
      "[2, 3, 4, 5, 15, 19, 26, 27, 36, 38, 44, 46, 47, 48, 50]\n"
     ]
    }
   ],
   "source": [
    "def inserSort(alist):\n",
    "    \n",
    "    for i in range(1,len(alist)):\n",
    "        value = alist[i]\n",
    "        while i>0 and value<alist[i-1]:\n",
    "            alist[i] = alist[i-1]\n",
    "            i -= 1\n",
    "        alist[i] = value\n",
    "        \n",
    "    return alist\n",
    "\n",
    "testList = [3,44,38,5,47,15,36,26,27,2,46,4,19,50,48]\n",
    "print(testList)\n",
    "print(inserSort(testList))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 归并排序 O(nlogn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"image/chapter04/mergeSort.gif\" width=600>\n",
    "\n",
    "归并排序是一种递归算法,不断将列表拆分为一半。如果列表为空或有一个项,则按定义(基本情况)进行排序。如果列表有多个项,我们分割列表,并递归调用两个半部分的合并排序。归并排序（MERGE-SORT）是利用归并的思想实现的排序方法，该算法采用经典的分治（divide-and-conquer）策略（分治法将问题分(divide)成一些小的问题然后递归求解，而治(conquer)的阶段则将分的阶段得到的各答案\"修补\"在一起，即分而治之)。\n",
    "\n",
    "一旦对这两半排序完成,就执行称为合并的基本操作。合并是获取两个较小的排序列表并将它们组合成单个排序的新列表的过程\n",
    "\n",
    "<img src=\"image/chapter04/Screenshot from 2018-03-31 14-47-16.png\" width=350>\n",
    "\n",
    "可以看到这种结构很像一棵完全二叉树，本文的归并排序我们采用递归去实现（也可采用迭代的方式去实现）。分阶段可以理解为就是递归拆分子序列的过程，递归深度为log2n。\n",
    "\n",
    "再来看看治阶段，我们需要将两个已经有序的子序列合并成一个有序序列，比如上图中的最后一次合并，要将[4,5,7,8]和[1,2,3,6]两个已经有序的子序列，合并为最终序列[1,2,3,4,5,6,7,8]，来看下实现步骤。\n",
    "\n",
    "\n",
    "<img src=\"image/chapter04/Screenshot from 2018-03-31 14-50-11.png\" width=350>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 44, 38, 5, 47, 15, 36, 26, 27, 2, 46, 4, 19, 50, 48]\n",
      "[2, 3, 4, 5, 15, 19, 26, 27, 36, 38, 44, 46, 47, 48, 50]\n"
     ]
    }
   ],
   "source": [
    "def mergeSort(alist):\n",
    "    mid = len(alist)//2                     # Midpoint for division\n",
    "    left, right = alist[:mid], alist[mid:]            \n",
    "    if len(left)>1 :left = mergeSort(left)\n",
    "    if len(right)>1:right = mergeSort(right)# Sort by havles\n",
    "        \n",
    "    temp = []\n",
    "    while left and right:               # Neither half is empty\n",
    "        if left[-1] >= right[-1]:       # Left has greatest last value\n",
    "            temp.append(left.pop())     # Append it\n",
    "        else:                           # Right has greatest last value\n",
    "            temp.append(right.pop())    # Append it\n",
    "            \n",
    "    temp.reverse()                      # Result is backward\n",
    "    return (left or right) + temp       # Also add remainder\n",
    "    \n",
    "testList = [3,44,38,5,47,15,36,26,27,2,46,4,19,50,48]\n",
    "print(testList)\n",
    "print(mergeSort(testList))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 快速排序 O(nlogn)~O(n^2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"image/chapter04/quickSort.gif\" width=600>\n",
    "\n",
    "快速排序由C. A. R. Hoare在1962年提出。它的基本思想是：通过一个选定的基准值将要排序的数据分割成独立的两部分，一部分的所有数据要比该基准值小，而另一部分的数据都要比该基准值大。然后再按此方法对这两部分数据分别进行快同样的操作，整个排序过程可以递归进行，以此达到整个数据变成有序序列。快排还被人们称为”20世纪最具影响力的十大算法之一“."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 44, 38, 5, 47, 15, 36, 26, 27, 2, 46, 4, 19, 50, 48]\n",
      "[2, 3, 4, 5, 15, 19, 26, 27, 36, 38, 44, 46, 47, 48, 50]\n"
     ]
    }
   ],
   "source": [
    "def quickSort(alist):\n",
    "    if len(alist) <= 1:return alist\n",
    "    pi,alist = alist[0],alist[1:]\n",
    "    lo = [x for x in alist if x<=pi]\n",
    "    hi = [x for x in alist if x>pi]\n",
    "    return quickSort(lo) + [pi] + quickSort(hi)\n",
    "\n",
    "testList = [3,44,38,5,47,15,36,26,27,2,46,4,19,50,48]\n",
    "print(testList)\n",
    "print(quickSort(testList))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 侏儒排序 O(n^2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "它很像冒泡，也是通过元素交换完成的。这个算法你初看起来只有一层循环，但是实际上它的游标却并不总是往前进的，有时需要<span class=\"mark\">后退</span>，而<span class=\"girk\">冒泡却总是不断前进的，只是前进的路程越来越短</span>。这个算法总是能够找到第一组大小顺序错误的毗邻元素，然后交换它们。但是交换这两个元素的时候，又可能会带来新的顺序错误的毗邻元素对，所以在交换元素之后需要重新检查影响到的毗邻元素。\n",
    "\n",
    "<img src=\"image/chapter04/gnomesort.gif\" width=250>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 算法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 44, 38, 5, 47, 15, 36, 26, 27, 2, 46, 4, 19, 50, 48]\n",
      "[2, 3, 4, 5, 15, 19, 26, 27, 36, 38, 44, 46, 47, 48, 50]\n"
     ]
    }
   ],
   "source": [
    "def gnomeSort(alist):\n",
    "    i = 0\n",
    "    while i<len(alist):\n",
    "        if i==0 or alist[i-1] <= alist[i]:\n",
    "            i += 1\n",
    "        else:\n",
    "            # 如果顺序有无误，则交换顺序\n",
    "            alist[i],alist[i-1] = alist[i-1],alist[i]\n",
    "            i -= 1      # 然后后退，看看下一对的顺序是否正确\n",
    "            \n",
    "    return alist\n",
    "\n",
    "testList = [3,44,38,5,47,15,36,26,27,2,46,4,19,50,48]\n",
    "print(testList)\n",
    "print(gnomeSort(testList))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 总结"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 1> 对于有序和无序列表,顺序搜索是\tO(n)。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 2> 冒泡排序,选择排序,侏儒排序和插入排序是\tO(n^2\t)算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 3> 归并排序是\tO(nlog^n\t),但是合并过程需要额外的空间"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 4> 快速排序是\tO(nlog^n),但如果分割点不在列表中间附近,可能会降级到O(n^2),不需要额外的空间。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
